Abstract

Background: The pig is a biomedical model to study human and livestock traits. Many of these traits are controlled by neuropeptides that result from the cleavage of prohormones by prohormone convertases. Only 45 prohormones have been confirmed in the pig. Sequence homology can be ineffective for annotating prohormone genes in sequenced species like the pig because of the multifactorial nature of prohormone processing. The goal of this study is to undertake the first complete survey of prohormone and prohormone convertase genes in the pig genome. These genes were functionally annotated based on 35 gene expression microarray experiments. The cleavage sites of prohormone sequences into potentially active neuropeptides were predicted.

Results: We identified 95 unique prohormone genes, 2 alternative calcitonin-related sequences, 8 prohormone convertases and 1 cleavage facilitator in the pig genome 10.2 assembly and trace archives. Of these, 11 pig prohormone genes have not been reported in the UniProt, UniGene or Gene databases. These genes are intermedin, cortistatin, insulin-like 5, orexigenic neuropeptide QRFP, prokineticin 2, prolactin-releasing peptide, parathyroid hormone 2, urocortin, urocortin 2, urocortin 3, and urotensin 2-related peptide. In addition, a novel neuropeptide S sequence was identified in the pig genome, correcting the previously reported pig sequence, which is identical to the rabbit sequence. Most differentially expressed prohormone genes were under-expressed in pigs experiencing immune challenge relative to un-challenged controls, in non-pregnant relative to pregnant sows, in old relative to young embryos, and in non-neural relative to neural tissues. The cleavage prediction based on human sequences had the best performance, with a correct classification rate of 92% for cleaved and non-cleaved sites, suggesting that the processing of prohormones in pigs is similar to that in humans. The cleavage prediction models did not find conclusive evidence supporting the production of the bioactive neuropeptides urocortin 2, urocortin 3, torsin family 2 member A, tachykinin 4, islet amyloid polypeptide, and calcitonin receptor-stimulating peptide 2 in the pig.

Conclusions: The present genomic and functional characterization supports the use of the pig as an effective 
animal model to gain a deeper understanding of prohormones, prohormone convertases and neuropeptides in 
biomedical and agricultural research. 

Keywords: Prohormone, Prohormone convertase, Neuropeptide, Pig genome, Gene expression profile, Cleavage



Background 

In addition to the importance in livestock production, the 
pig is a well-established biomedical model to study human 
health due to the genomic, anatomic and physiologic simi- 
larities with humans. A wide variety of human health 
traits including cancer, reproductive health, drug metabol- 
ism, wound healing, and cardiovascular disease have been 
successfully studied using the pig [1-4]. Underlying these 
and other important traits such as growth and development, feeding, stress, memory and susceptibility to substance abuse are neuropeptides, a class of cell-cell signaling peptides that have paracrine, endocrine, and autocrine effects [5,6]. Research on selected pig neuropeptides has offered insights into cell transplantation, nervous system diseases, and injury [7]. For example, Yang et al. [8] reported an inhibitory effect of neuromedin S (NMS) on luteinizing hormone secretion, mediated via melanocyte-stimulating hormone (MSH) neurons in the hypothalamus-pituitary axis of ovariectomized pigs. Kaminski et al. [9] concluded that the hypothalamic peptides orexin A and orexin B are involved in the control of food intake, sleep patterns, and the autonomic and neuroendocrine systems, and may also affect reproductive functions through their influence on the hypothalamic-pituitary-ovarian axis in pigs.

The identification of neuropeptides is more difficult 
than most proteins because neuropeptides are derived 
from larger prohormone proteins as a result of complex 
enzymatic processing. The conversion of the large prohor- 
mone to one or multiple smaller neuropeptides involves 
cleavage by multiple prohormone convertases and add- 
itional post-translational modifications such as amidation 
and glycosylation [5]. This complex processing of prohor- 
mones into neuropeptides challenges the identification of 
neuropeptide genes across genomes solely based on se- 
quence homology to better understood species [5,6,10]. 
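Because prohormone convertases typically cut at paired basic residues (for example Lys-Arg or Arg-Arg), a naive first pass over a prohormone sequence is to list every dibasic pair. The sketch below does only that. It is not NeuroPred or the prediction method used in this study, the function names are hypothetical, and it deliberately ignores monobasic sites and the wider sequence context, which is precisely why motif scans and homology alone are unreliable across genomes.

```python
import re
from typing import List

# Simplified motif: two consecutive basic residues (K or R). Real convertase
# specificity also depends on monobasic sites and surrounding residues.
DIBASIC = re.compile(r"[KR][KR]")

def dibasic_sites(prohormone: str) -> List[int]:
    """0-based positions immediately after each non-overlapping K/R pair."""
    return [m.end() for m in DIBASIC.finditer(prohormone.upper())]

def cleave(prohormone: str) -> List[str]:
    """Cut at each dibasic pair and drop the basic residues themselves,
    roughly mimicking convertase cleavage followed by removal of the
    leftover basic residues by carboxypeptidases."""
    seq = prohormone.upper()
    peptides, prev = [], 0
    for m in DIBASIC.finditer(seq):
        if m.start() > prev:
            peptides.append(seq[prev:m.start()])
        prev = m.end()
    if prev < len(seq):
        peptides.append(seq[prev:])
    return peptides

if __name__ == "__main__":
    toy = "AAAKRGGGRRLLL"  # hypothetical toy sequence, not a real prohormone
    print(dibasic_sites(toy))  # [5, 10]
    print(cleave(toy))         # ['AAA', 'GGG', 'LLL']
```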

Only 40 prohormone and 2 prohormone convertase 
genes have been empirically confirmed in the pig com- 
pared to approximately 100 genes identified in human, 
rat, mouse, cow and rhesus monkey [11-13]. This partial 
knowledge of the prohormone, prohormone convertase 
and associated neuropeptides in the pig is a critical 
shortcoming, especially considering the important role 
of the pig in biomedical and agricultural research [1]. In
addition, few gene expression studies have discussed the 
expression profile of pig prohormone or prohormone 
convertase genes. Ross et al. [14] found that estradiol 
treatment at day 9 of gestation was associated with 
changes in the expression of the prohormone neuromedin B
(NMB) in the endometrium of gilts. Hausman et al. [15]
concluded that the expression of neuropeptide Y (NPY) 
was down-regulated with age in gilts ranging from 90 to 
210 days old. 

Understanding the role of neuropeptides in human 
and livestock traits using the pig as biomedical model 
requires a comprehensive knowledge of the neuropeptide complement in the recently released pig genome (Sscrofa10.2 v18 [16]). This understanding includes the
identification of prohormone and prohormone conver- 
tase genes, prediction of cleavage sites in prohormones 
that may result in potentially bioactive neuropeptides, 
and characterization of gene expression and protein 
abundance across conditions to gain insights into the 
role of neuropeptides. A complete survey of the prohor- 
mone and prohormone convertase genes in the pig sup- 
ports the interpretation of gene expression experiments 
and improves the effectiveness of tandem mass spec- 
trometry studies to identify neuropeptides [17-19]. Func- 
tional annotation of these genes can be obtained by the 
analysis of the large number of gene expression experi- 
ments already available [20,21]. To address the lack of a 
comprehensive understanding of the prohormone and 
prohormone convertase genes in the pig, we present the 
first comprehensive survey and functional annotation of
these genes. An all-inclusive catalogue of prohormone
and prohormone convertase genes known in other spe- 
cies was used to search complementary pig genome 
databases. These genes were then characterized by ana- 
lyzing a large number of gene expression experiments 
across a wide range of conditions. The potential cleavage 
sites of prohormones that can result in bioactive neuro- 
peptides were predicted and compared to the cleavages 
based on known neuropeptide sequences. 
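The comparison between predicted and known cleavages can be summarized as a correct classification rate: the fraction of candidate sites whose predicted state (cleaved or not cleaved) agrees with the state implied by known neuropeptide sequences. A minimal sketch follows, assuming each prohormone has already been reduced to sets of candidate, predicted and known site positions; this representation and the function name are hypothetical, not the study's data format.

```python
from typing import Set

def correct_classification_rate(candidates: Set[int],
                                predicted: Set[int],
                                known: Set[int]) -> float:
    """Share of candidate sites whose predicted state matches the known state.

    candidates: positions considered by the predictor (e.g. basic-residue sites)
    predicted:  positions the model calls cleaved
    known:      positions implied by experimentally known neuropeptides
    """
    if not candidates:
        raise ValueError("no candidate sites to score")
    # A site is correct when both say "cleaved" or both say "not cleaved".
    correct = sum((site in predicted) == (site in known) for site in candidates)
    return correct / len(candidates)

if __name__ == "__main__":
    # Toy positions only; they do not come from any pig prohormone.
    rate = correct_classification_rate({5, 10, 20, 31}, {5, 10, 20}, {5, 20})
    print(f"{rate:.2f}")  # 0.75
```

Pooling sites across all prohormones before dividing weights long prohormones more heavily than averaging per-prohormone rates would; which convention lies behind the reported 92% is not stated here.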

Results and discussion 

Pig prohormone genes 

A comprehensive catalogue of 95 potential pig prohormone genes, 8 prohormone convertase genes and 1 prohormone convertase facilitator gene (7B2) was identified in the pig genome. Table 1 lists the genes and the corresponding BLAST matches in the pig genome, UniProt, Gene and UniGene databases. There are 11 prohormone genes previously unreported in the pig (i.e. without empirical confirmation), and complete sequences were obtained for genes where only partial or incomplete sequences had previously been reported. The newly identified genes are intermedin (ADM2), cortistatin (CORT), insulin-like 5 (INSL5), orexigenic neuropeptide QRFP (OX26), prokineticin 2 (PROK2), prolactin-releasing peptide (PRRP), parathyroid hormone 2 (TIP39), urocortin (UCN1), urocortin 2 (UCN2), urocortin 3 (UCN3), and urotensin 2-related peptide (UTS2B). This search also identified two different calcitonin protein entries in public databases that are isoforms of other calcitonin genes. Additional information on the comprehensive catalogue of genes is available [see Additional file 1]. The predicted prohormone protein sequences with cleavage site identification are provided in the NeuroPred format [see Additional file 2].

Table 1 Prohormone and convertase genes identified across pig genome resources

Type^a | Symbol | Gene name | Genome sequence^b | UniGene^c | UniProt^c | Gene^c
P | ADM2 | Intermedin | complete | Not Found | F1RXU1 | 100517471
P | ADM5 | Adrenomedullin-5 | complete | Ssc.26627 | A5LHG2 | 100101476
P | ADML | Adrenomedullin | complete | Ssc.314 | P53366 | 397195
P | ANF | Atrial natriuretic factor | complete | Ssc.16245 | P24259 | 397496
P | ANFB | Natriuretic peptide B | complete | Ssc.629 | P07634 | 396844
P | ANFC | C-type natriuretic peptide | complete | Ssc.23867 | P18104 | 493772
P | APEL | Apelin | complete | CU928865 | Not Found | 100625006
P | AUGN | Augurin | complete | Ssc.22487 | F1SU23 | 100512958
P | CALC | Calcitonin/calcitonin gene-related peptide 1 | complete | Ssc.14052 | A6P7L6 | 100125547
P | CALCalt | Preprocalcitonin gene-related peptide | complete | Ssc.56129 | A6P7L7 | 100124407
P | CART | Cocaine- and amphetamine-regulated transcript protein | complete | Ssc.15900 | Q307W6 | 397252
P | CCKN | Cholecystokinin | complete | Ssc.717 | P01356 | 397468
P | CMGA | Chromogranin-A | complete | Ssc.4653 | P04404 | 397540
P | COLI | Pro-opiomelanocortin | complete | Ssc.14556 | P01192 | 396863
P | CORT | Cortistatin | complete | Not Found | F1RIF7 | 100526112
P | CRF | Corticoliberin | complete | Ssc.69887 | P06296 | 100127468
P | CRSP1 | Calcitonin receptor-stimulating peptide 1 | complete | Ssc.3741 | Q862B1 | 396563
P | CRSP2 | Calcitonin receptor-stimulating peptide 2 | complete | Ssc.18558 | Q766Y7 | 396574
P | CRSP3 | Calcitonin receptor-stimulating peptide 3 | complete | Ssc.17879 | Q766Y6 | 396573
P | CRSP3alt | Calcitonin-2 | complete | Not Found | A0A761 | Not Found
P | EDN1 | Endothelin-1 | complete | Ssc.9364 | P09558 | 396915
P | EDN2 | Endothelin-2 | complete | Not Found | Not Found | Not Found
P | EDN3 | Endothelin-3 | complete | Ssc.31972 | A5A752 | 100049663
P | GALA | Galanin | complete | Ssc.?13 | P07480 | 397465
P | GALP | Galanin-like peptide | complete | Ssc.4875 | Q9TT95 | 396772
P | GAST | Gastrin | complete | Ssc.644 | P01351 | 445524
P | GHRL | Obestatin | complete | Ssc.440 | Q9GKY5 | 396728
P | GIP | Gastric inhibitory polypeptide | complete | Ssc.38713 | P01281 | 100621117
P | GLUC | Glucagon | complete | Ssc.17225 | P01274 | 397595
P | GON1 | Progonadoliberin-1 | complete | Ssc.16310 | P49921 | 397516
P | GON2 | Progonadoliberin-2 | Not Found | Not Found | F1S8B1 | 100523475
P | GRP | Gastrin-releasing peptide | complete | Ssc.13923 | P63153 | Not Found
P | HEPC | Hepcidin | complete | Ssc.376 | Q8MJ80 | 397207
P | IAPP | Islet amyloid polypeptide | complete | Ssc.8324 | Q29119 | 100520838
P | IGF1 | Insulin-like growth factor 1 | complete | Ssc.16231 | P16545 | 397491
P | IGF2 | Insulin-like growth factor 2 | fragment | Ssc.9355 | P23695 | 396916
P | INS | Insulin | complete | Ssc.583 | P01315 | 397415
P | INSL3 | Insulin-like 3 | complete | Ssc.11990 | P51461 | 397024
P | INSL5 | Insulin-like 5 | complete | Not Found | Not Found | 100620109
P | INSL6 | Insulin-like 6 | complete | Ssc.46919 | F1SK47 | 100158105
P | KISS1 | Metastasis-suppressor KISS-1 | complete | Ssc.73565 | B5M447 | 100145896
P | MCH | Pro-melanin-concentrating hormone | complete | Ssc.3287 | Q9TTS8 | 396962
P | MOTI | Motilin | complete | Ssc.714 | P01307 | 397466
P | NEU1 | Oxytocin | complete | Ssc.15668 | P01177 | 100152272
P | NEU2 | Neurophysin-2 | complete | Ssc.4210 | P01183 | 396995
P | NEUT | Neurotensin | complete | Ssc.38680 | F1SPX3 | 100739079
P | NMB | Neuromedin-B | complete | Ssc.2083 | B0LUW4 | 100141313
P | NMS | Neuromedin-S | complete | Ssc.12508 | C3UZJ1 | 100294685
P | NMU | Neuromedin-U | complete | Ssc.12508 | P34964 | 100523263
P | NPB | Neuropeptide B | complete | Ssc.82498 | Not Found | Not Found
P | NPFF | Neuropeptide FF | complete | Ssc.44958 | F1SFP1 | 100518250
P | NPS | Neuropeptide S | complete | Ssc.73596 | F1RSG4 | 100188981
P | NPW | Neuropeptide W | complete | Ssc.15796 | Q8MI35 | 396680
P | NPY | Neuropeptide Y | complete | Ssc.15981 | P01304 | 397304
P | OREX | Orexin | complete | Ssc.15983 | O77668 | 397305
P | OSTN | Osteocrin (Musclin) | complete | Ssc.5148 | A5JHN9 | 100049691
P | OX26 | Orexigenic neuropeptide QRFP | complete | Not Found | F1S0X5 | 100524361
P | PACA | Pituitary adenylate cyclase-activating polypeptide | complete | Ssc.27598 | P41535 | 414283
P | PAHO | Pancreatic polypeptide | complete | Ssc.456 | P01300 | 397272
P | PCSK1N | Proprotein convertase subtilisin/kexin type 1 inhibitor | complete | Ssc.17429 | Not Found | 100621697
P | PDGFA | Platelet-derived growth factor alpha polypeptide | complete | Ssc.6173 | F1RIZ0 | 100519764
P | PDGFB | Platelet-derived growth factor beta polypeptide | complete | Ssc.54182 | P20034 | 100126843
P | PDGFD | Platelet-derived growth factor D | complete | Ssc.49835 | F1SV50 | 100524161
P | PDYN | Proenkephalin-B | complete | Ssc.121 | P01214 | 445529
P | PENK | Proenkephalin | complete | Ssc.11281 | Q7M3H2/Q7M2Z7 | 100152093
P | PNOC | Prepronociceptin | complete | Ssc.15910 | P55791 | 397257
P | PROK2 | Prokineticin 2 | fragment | EW633867 | Not Found | 100526076
P | PRRP | Prolactin-releasing peptide | fragment | Not Found | Not Found | Not Found
P | PTHR | Parathyroid hormone-related peptide | complete | Ssc.9991 | Q866H2 | 396951
P | PTHY | Parathyroid hormone | complete | Ssc.668 | P01269 | 399502
P | PYY | Peptide YY | complete | Ssc.63650 | P68005 | 445018
P | REL1 | Pro-relaxin 1 | complete | Ssc.162 | P01348 | 396891
P | REL3 | Relaxin 3 | complete | Ssc.42647 | Q8HY17 | 503836
P | RES18 | Regulated endocrine-specific protein 18 | complete | Ssc.49266 | F1SR77 | 100154377
P | RFRP | Neuropeptide VF precursor | complete | Ssc.75350 | C4P9W1 | 100302024
P | SCG1 | Secretogranin-1 | complete | Ssc.15718 | Q9GLG4 | 397154
P | SCG2 | Secretogranin-2 | complete | Ssc.13645 | Q5FZP5 | 497237
P | SCG3 | Secretogranin-3 | complete | Ssc.6770 | F1RYP7 | 100154760
P | SECR | Secretin | complete | Ssc.710 | P63298 | 397464
P | SLIB | Somatoliberin | complete | Ssc.71374 | P01287 | 100499556
P | SMS | Somatostatin | complete | Ssc.19520 | P01168 | 494469
P | SPXN | Spexin | complete | Ssc.57764 | F1SR03 | 100155886
P | TIP39 | Parathyroid hormone 2 | complete | Not Found | F1RHZ | 100515141
P | TKN1 | Tachykinin, precursor 1 | complete | Ssc.18075 | F1SF85 | 100525179
P | TKN4 | Tachykinin-4 | complete | Ssc.23153 | F1RTB7 | 100511101
P | TKNK | Tachykinin 3 | complete | Ssc.19565 | P67934 | 492314
P | TOR2X | Torsin family 2, member A | fragment | Ssc.67158 | B6VD08 | 100519815
P | TRH | Prothyroliberin | complete | Not Found | P62968 | 100513309
P | UCN1 | Urocortin | Traces | Not Found | F8R6K7 | Not Found
P | UCN2 | Urocortin 2 | complete | Not Found | F1SKM2 | 100521865
P | UCN3 | Urocortin 3 | complete | Not Found | F1RYW0 | 100737810
P | UTS2 | Urotensin 2 | complete | Ssc.437 | Q95J46 | 397268
P | UTS2B | Urotensin II-related peptide | complete | Not Found | F1SFH3 | 100626084
P | VEGFC | Vascular endothelial growth factor C | complete | Ssc.12790 | F1RT19 | 100525960
P | VEGFD | Vascular endothelial growth factor D | complete | Ssc.29289 | F1SQU4 | 100155670
P | VGF | Neurosecretory protein VGF | fragment | Ssc.90772 | Not Found | 100624333
P | VIP | Vasoactive intestinal peptide | complete | Ssc.47759 | E0Y441 | 100500718
C | 7B2 | Neuroendocrine protein 7B2 (secretogranin 5) | complete | Ssc.155 | P01165 | 397110
C | FURIN | Furin | complete | Ssc.94009 | F1RMJ1 | 100156882
C | PCSK1 | Proprotein convertase subtilisin/kexin type 1 (PC1/3) | complete | Ssc.92884 | Q28959 | 397103
C | PCSK2 | Proprotein convertase subtilisin/kexin type 2 | complete | Ssc.109 | Q03333 | 445533
C | PCSK4 | Proprotein convertase subtilisin/kexin type 4 | complete | Ssc.47037 | Not Found | 100626523
C | PCSK5 | Proprotein convertase subtilisin/kexin type 5 | incomplete | Ssc.43614 | Not Found | 100519237
C | PCSK6 | Proprotein convertase subtilisin/kexin type 6 | incomplete | Ssc.73551 | F1RZ92 | 100152144
C | PCSK7 | Proprotein convertase subtilisin/kexin type 7 | complete | Ssc.5628 | F1SJT0 | 100523009
C | PCSK9 | Proprotein convertase subtilisin/kexin type 9 | complete | Ssc.84357 | Not Found | 100620501

^a P: prohormone gene, C: prohormone convertase gene.
^b Genome sequence found: complete or incomplete in the pig genome assembly, found in the Traces archive, or Not Found in any genome repository.
^c Identifiers in the UniGene, UniProt and Gene databases.

Table 2 summarizes the evidence from various repositories supporting the 95 unique pig prohormone genes and 2 alternative sequences detected in this study. The genome-predicted prohormone genes are supported by 66 UniProt entries (Table 1), including 47 sequences that have evidence at the protein level, of which 39 have a complete protein sequence and 8 have a partial protein sequence. Additionally, 81, 91 and 19 of the detected prohormone genes are supported by transcript evidence from UniGene, Gene, and UniProt, respectively. Two genes, apelin (APEL) and PROK2, are supported by evidence in UniProt and by EST records unassigned to a UniGene cluster. UniProt supporting information includes 2 entries from alternatively spliced calcitonin genes, preprocalcitonin gene-related peptide and calcitonin-2, that have evidence at the protein and transcript levels, respectively. However, calcitonin-2 is not assigned to a UniGene cluster or NCBI Gene identifier because no EST matches the unique region of the reported sequence. The UniProt entry for prothyroliberin (TRH) refers only to the 3-amino acid thyroliberin peptide; however, no pig EST has been reported. Only 8 genome predictions do not have supporting evidence in UniProt, and 5 of these have supporting transcript evidence. Complete protein sequences were predicted for insulin-like 5 (INSL5) and endothelin-2 (EDN2), and an incomplete protein sequence was predicted for the prolactin-releasing peptide (PRRP) gene. The nucleic acid and protein sequences of EDN2 have been reported [22], but this information is not present in public databases. Of the 23 UniProt predicted sequences, 14 entries have transcript support and 7 do not. There are 2 UniProt predicted sequences that correspond to genes absent from the pig genome assembly studied. A partial match to UCN1 was found in the trace archives, although this gene was not found in the pig genome assembly studied and lacks external validation. Progonadoliberin-2 (GON2) was not found in the current pig genome assembly, although this gene was detected in earlier assembly versions and trace archives. This gene has no current experimental evidence beyond homology to other mammalian species. The apparent absence of UCN1 and GON2 from the assembly, and the fragmentary evidence for other prohormones, is most likely due to poor coverage of the genomic regions where these prohormone genes are located.
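The evidence counts quoted in this paragraph can be cross-checked against Table 2 with a few lines of arithmetic. The sketch below simply transcribes the Table 2 cells and sums them; the dictionary layout and function names are illustrative choices, not a file or tool distributed with the study.

```python
# Cells transcribed from Table 2: (genome prediction, UniGene status) ->
# counts per UniProt evidence class, in the order given by COLUMNS.
COLUMNS = ["Protein", "Partial", "Transcript", "Predicted", "Not Found"]
TABLE2 = {
    ("complete", "Present"): [38, 7, 17, 14, 3],
    ("complete", "Not Found"): [0, 1, 1, 7, 2],
    ("fragment", "Present"): [1, 0, 1, 0, 2],
    ("fragment", "Not Found"): [0, 0, 0, 0, 1],
    ("Not Found", "Not Found"): [0, 0, 0, 2, 0],
}

def column_totals(table):
    """Sum each UniProt evidence column over all genome/UniGene rows."""
    return {col: sum(row[i] for row in table.values())
            for i, col in enumerate(COLUMNS)}

def unigene_present(table):
    """Number of sequences with any EST evidence (UniGene status 'Present')."""
    return sum(sum(row) for (_, unigene), row in table.items()
               if unigene == "Present")

totals = column_totals(TABLE2)
protein_level = totals["Protein"] + totals["Partial"]
print(totals)
print(sum(totals.values()))                  # 97: 95 genes + 2 alternative sequences
print(protein_level)                         # 47 entries with protein-level evidence
print(protein_level + totals["Transcript"])  # 66 supporting UniProt entries
print(unigene_present(TABLE2))               # 83 sequences with EST evidence
```

The 83 sequences with EST evidence include APEL and PROK2, whose ESTs are not assigned to a UniGene cluster, which appears to reconcile this total with the 81 UniGene-supported genes reported in the text.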

At least four calcitonin genes, also known as calcitonin receptor-stimulating peptide genes, were identified, 2 of which exhibit alternative splicing. The genome assembly permitted the assignment of the UniProt pre-procalcitonin (A6P7L6) and preprocalcitonin gene-related peptide (A6P7L7) entries to the same gene, which also produces the UniProt calcitonin (CALC, [Swiss-Prot:P01259]) and calcitonin gene-related peptide (CALCA, [Swiss-Prot:P30880]) peptides, respectively. This alternatively spliced gene corresponds to the CALCA gene found in other mammalian species.

The bioinformatics gene prediction pipeline confirmed that the separate Q766Y6 and A0A761 UniProt entries are alternatively spliced variants of the same calcitonin receptor-stimulating peptide 3 (CRSP3) gene, as initially reported by Rezaeian et al. [23]. A single genome match was identified for calcitonin receptor-stimulating peptide 2 (CRSP2, Q766Y7). While the calcitonin receptor-stimulating peptide 1 (CRSP1) gene has been reported, the actual gene may be inaccurately assembled in the 10.2 genome release because the region appears to contain a small duplication leading to 2 starting locations. Further supporting this argument, a conserved 19-amino acid region in all calcitonin receptor-stimulating peptide-related protein sequences matched an additional, 5th genome site, which was part of a discontinued NCBI Gene entry (Gene ID 100624618). There is insufficient information to conclude whether a separate coding gene is involved or whether this is an assembly-related problem.

Neuropeptide S (NPS) is a potential 12th prohormone gene discovered by the bioinformatics gene prediction pipeline. Two genome matches on different chromosomes were identified for the NPS gene. However, the predicted sequence matched both chromosomal locations with complete identity, implying an assembly error rather than a duplication event. Although UniProt has a partial pig NPS entry (B5M997), the genome-predicted protein sequence was, as expected, more similar to the bovine sequence than to the present partial UniProt pig sequence. The previously reported pig NPS protein and nucleic acid sequences were found to be 100% and 99% identical, respectively, to the rabbit sequence ([GenBank:EU978456]). The similarity between the UniProt pig and rabbit sequences was also evident in the phylogenetic relationships among NPS sequences reported by Yao et al. [24]. These findings call into question the present pig NPS entry in UniProt.
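Statements such as "100% and 99% identical" depend on how identity is counted. One common convention, sketched below under that assumption, divides identical aligned positions by the number of aligned columns, skipping columns that are gaps in both sequences. The source does not state which convention or alignment tool produced its figures, the function name is hypothetical, and the example sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns where both sequences have a gap are ignored; a gap facing a
    residue counts as a non-identical column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("empty alignment")
    matches = sum(a == b and a != "-" for a, b in columns)
    return 100.0 * matches / len(columns)

if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
    print(percent_identity("AC-GT", "ACAGT"))            # 80.0
```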

Table 2 Distribution of the prohormone gene predictions across UniProt and UniGene resources

Genome^b | UniGene^c | Protein^a | Partial^a | Transcript^a | Predicted^a | Not Found^a
complete | Present | 38 | 7 | 17 | 14 | 3
complete | Not Found | 0 | 1 | 1 | 7 | 2
fragment | Present | 1 | 0 | 1 | 0 | 2
fragment | Not Found | 0 | 0 | 0 | 0 | 1
Not Found | Not Found | 0 | 0 | 0 | 2 | 0

^a UniProt evidence: "type of evidence that supports the existence of the protein"; Protein: complete protein sequence; Partial: incomplete protein sequence, such as presence of a peptide; Transcript: "existence of a protein has not been strictly proven but there is expression data (such as existence of cDNAs, RT-PCR or Northern blots) that indicate the existence of a transcript."; Predicted: complete or partial sequence of the protein has been predicted; Not Found: no match found in the UniProt database.
^b Genome: prediction of the protein sequence from the genome assembly: complete denotes full sequence, fragment denotes incomplete prediction, and Not Found denotes no match.
^c UniGene: Present or Not Found denotes whether the gene had any EST evidence or not, respectively.

Prohormone genes previously unreported in pig

The genome search identified 11 prohormone genes that do not have empirical confirmation in the UniProt, UniGene or Gene databases (Table 1). These genes are ADM2, CORT, INSL5, OX26, PROK2, PRRP, TIP39, UCN1, UCN2, UCN3, and UTS2B; NPS is discussed with them because the current pig NPS entry is identical to the rabbit NPS. Only inferred sequences are available for ADM2 and CORT in UniProt. There is evidence for mammalian homologs of all these genes in UniProt. The protocol followed to identify these genes relied on a high percentage of identities and similarities, a minimum percentage of mismatches and gaps, and conservation of the region encompassing the potential neuropeptide.

Intermedin, or adrenomedullin 2 (ADM2), is part of the calcitonin family of peptides and has effects similar to those of adrenomedullin (ADML). In humans, intermedin causes hypotension when given peripherally, and increases blood pressure and causes sympathetic activation when given to the central nervous system [25]. This neuropeptide also induces prolactin release, has antidiuretic and natriuretic properties, and reduces food intake. The amino acid sequences of CORT and somatostatin (SMS) are highly similar, and both peptides reduce neuronal activity. In addition, CORT has unique roles such as induction of slow-wave sleep, reduction of locomotor activity, and activation of cation-selective currents not responsive to SMS [26]. Although the function of INSL5 is still being determined, high expression in the colon, as well as in the brain and hypothalamus, suggests roles in gut contractility and neuroendocrine signaling [27]. Likewise, the function of OX26 is still being elucidated, although studies in chicken confirm the orexigenic (appetite-stimulating) activity of this neuropeptide [28]. Takayanagi and Onaka [29] demonstrated that PRRP plays a role in the control of energy metabolism and the stress response. Prokineticins are involved in tumorigenesis (prostate, testicles, neuroblastoma, colon, and pancreas), acting as a growth factor for cancer cells, an angiogenic factor, and a chemotactic factor for pro-inflammatory neutrophils [30]. NPS has anxiolytic-like effects (stress reduction) and can induce arousal and wakefulness [31]. TIP39 and its receptor form a neuromodulator system whose anatomical distribution indicates a role in limbic, endocrine, viscerosensory, and auditory functions. This system has been postulated as a potential drug target in anxiety, depression and chronic pain management [32]. Urocortins and their receptors have been found in the central nervous, digestive, reproductive, cardiovascular, immune and endocrine systems, suggesting a variety of roles including cardiovascular activity and cell survival [33]. UTS2B is a paralog of urotensin 2 (UTS2) that exerts similar biological effects, including relaxation of muscles and reduction of blood pressure [34].
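The identification criteria described above (high identity and similarity, few mismatches and gaps, and conservation of the region that encodes the putative neuropeptide) can be expressed as a simple filter over alignment hits. In the sketch below the field names loosely mirror BLAST+ tabular output columns (pident, ppos, mismatch, gapopen, length, qstart, qend), but the Hit record, the function name and every numeric cut-off are hypothetical: the study does not report the thresholds it applied.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    """One alignment between a reference prohormone (query) and a pig sequence."""
    subject: str
    pident: float   # percent identical positions
    ppos: float     # percent positive-scoring (similar) positions
    mismatch: int   # number of mismatched positions
    gapopen: int    # number of gap openings
    length: int     # alignment length
    qstart: int     # 1-based start of the alignment on the query
    qend: int       # 1-based end of the alignment on the query (inclusive)

def keep_hit(hit: Hit, peptide_start: int, peptide_end: int,
             min_pident: float = 70.0, min_ppos: float = 80.0,
             max_mismatch_frac: float = 0.3,
             max_gapopen_frac: float = 0.05) -> bool:
    """True if the hit passes all (hypothetical) cut-offs and spans the query
    region [peptide_start, peptide_end] holding the putative neuropeptide."""
    covers_peptide = hit.qstart <= peptide_start and hit.qend >= peptide_end
    return (hit.pident >= min_pident
            and hit.ppos >= min_ppos
            and hit.mismatch / hit.length <= max_mismatch_frac
            and hit.gapopen / hit.length <= max_gapopen_frac
            and covers_peptide)

if __name__ == "__main__":
    hit = Hit("scaffold_X", pident=88.0, ppos=94.0, mismatch=12,
              gapopen=1, length=110, qstart=1, qend=110)
    print(keep_hit(hit, peptide_start=70, peptide_end=100))  # True
    print(keep_hit(hit, peptide_start=90, peptide_end=120))  # False: peptide not covered
```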

Pig prohormone convertase genes 

The sequences of 8 prohormone convertase genes and of the 7B2 facilitator gene, also known as secretogranin 5 (SCG5), were identified in the pig genome (Table 1). The UniProt and Gene databases only had supporting evidence for PCSK1, PCSK2, and 7B2. Six additional prohormone convertase genes (furin, PCSK4, PCSK5, PCSK6, PCSK7, and PCSK9) that were previously unreported or not based on empirical evidence were identified. Only transcript evidence supports the 8 prohormone convertase genes, whereas protein evidence is available for 7B2. Dai et al. [35] isolated PCSK1 from the ovary cDNA library of a pregnant sow, and Renegar et al. [36] detected PCSK1 in the corpus luteum and brain of pregnant sows. Also, PCSK1 and PCSK2 mRNA has been identified in the pituitary neurointermediate lobes of pigs [37]. Among the prohormone convertases, furin, PCSK4, PCSK5, PCSK6 and PCSK7 do not have UniGene entries. The present catalogue extends the currently limited work on pig prohormone convertases.

Functional characterization of the pig prohormone and 
prohormone convertase genes 

Analysis of the large number of microarray gene expres- 
sion experiments enabled the first comprehensive 
characterization of the role of prohormone and prohor- 
mone convertase genes in biological processes in the pig. 
The results from these analyses augmented the understanding of the role of these genes in reproduction, health, growth, and other traits of importance to biomedical research and agricultural production.

The query of the Affymetrix Porcine Genome Array identified 77 probes representing 56 prohormone and 3 prohormone convertase genes. Table 3 lists the total number of differentially expressed probes (P-value < 0.005) within the seven experimental classes considered. A detailed distribution of the differential expression level of each probe and experiment is provided [see Additional file 3]. A discussion of the findings for the 3 groups with the highest number of differentially expressed probes (immune-related, embryo and placenta, and brain and central nervous system) is presented below. Although neuropeptides expressed in the brain and the immune system interact with circulating cytokines to support two-way communication between the brain and the immune system [38], we describe the profiles of prohormones in immune-related tissues separately from the brain and central nervous system tissues to facilitate the interpretation of results.
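Counting differentially expressed probes per experimental class, as in Table 3, amounts to thresholding each probe's P-value at 0.005 and counting distinct probes per class. A minimal sketch follows, assuming a flat list of (probe, class, P-value) records; the record type, function name and example values are hypothetical, and counting a probe once even when it is significant in several experiments of the same class is an assumption about how the totals were formed.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set

class ProbeResult(NamedTuple):
    probe: str             # microarray probe-set identifier
    experiment_class: str  # e.g. "immune-related", "embryo and placenta"
    p_value: float         # P-value of the contrast for this probe

P_THRESHOLD = 0.005  # significance threshold stated in the text

def count_de_probes(results: Iterable[ProbeResult],
                    threshold: float = P_THRESHOLD) -> Dict[str, int]:
    """Count distinct probes with P-value below `threshold` in each class."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for r in results:
        if r.p_value < threshold:
            hits[r.experiment_class].add(r.probe)
    return {cls: len(probes) for cls, probes in hits.items()}

if __name__ == "__main__":
    # Made-up probe IDs and P-values for illustration only.
    demo = [
        ProbeResult("probe_1", "immune-related", 0.001),
        ProbeResult("probe_1", "immune-related", 0.003),  # same probe, another experiment
        ProbeResult("probe_2", "immune-related", 0.2),
        ProbeResult("probe_2", "embryo and placenta", 0.004),
    ]
    print(count_de_probes(demo))  # {'immune-related': 1, 'embryo and placenta': 1}
```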

Immune-related profiling 

Several studies have demonstrated that prohormone genes play an important role in the pig immune response [39]. Consistent with this role, a high number of prohormone and prohormone convertase genes (24 genes) were differentially expressed among experiments that evaluated the immune response in blood, spleen, and lymph nodes (Table 3). Differentially expressed genes were: ADML, augurin (AUGN), cholecystokinin (CCKN), CRSP3, endothelin-1 (EDN1), galanin (GALA), galanin-like peptide (GALP), progonadoliberin-1 (GON1), insulin-like growth factor I (IGF1), insulin-like growth factor II (IGF2), neuromedin-B (NMB), neuromedin-U (NMU), neuropeptide Y (NPY), platelet-derived growth factor subunit A (PDGFA), proenkephalin-A (PENK), prorelaxin 1 (REL1), secretogranin-1 (SCG1), secretogranin-2 (SCG2), secretogranin-3 (SCG3), SMS, vascular endothelial growth factor C (VEGFC), vascular endothelial growth factor D (VEGFD), PCSK1 and PCSK7.

In general, prohormone genes were under-expressed in pigs under immune challenge relative to the un-challenged controls. AUGN was differentially expressed in two experiments: GSE7313 [40], which profiled lymph nodes, and GSE14790 [41], which profiled blood. In GSE14790, 7-day-old pigs were inoculated with porcine circovirus type 2 (PCV2), a virus that is widely spread across pig farms, and gene expression was profiled at 0, 7, 14, 21 and 29 days post inoculation (dpi). AUGN was over-expressed in un-inoculated pigs at 29 dpi relative to pigs at 7 dpi, regardless of inoculation, and relative to inoculated pigs at 21 dpi (P-value < 2.5 x 10^-[?]). Both contrasts indicate that the expression of AUGN increases with age and that this trend is slower in pigs infected with PCV2. In GSE7313, the gene expression of seven-week-old piglets inoculated with Salmonella Typhimurium was profiled at 8 hours post inoculation (hpi), 24 hpi, 48 hpi, and 21 dpi. AUGN was over-expressed at 21 dpi relative to 24 hpi and 48 hpi (P-values < 6.8 x 10^-[?] and 2.7 x 10^-[?], respectively). Consistent with the differential expression in relation to the immune response observed in this study, AUGN is a putative tumor suppressor gene and is down-regulated in many cancers [42].

IGF2, a member of the insulin family involved in development and growth, was differentially expressed across immune-related experiments. IGF2 was represented by 12 probes on the microarray platform, and 6 of these probes were differentially expressed across experiments. Five probes (Ssc.9365.1.S1_at, Ssc.9365.2.S1_a_at, Ssc.9365.5.A1_at, Ssc.9365.5.S1_at, Ssc.9365.6.S1_x_at) were differentially expressed in experiment GSE14790 [41], and one probe (Ssc.9365.3.S1_a_at) in experiment GSE7314 [43]. In GSE14790, IGF2 was under-expressed in non-inoculated piglets at 7 dpi relative to inoculated pigs at various days (P-value < 1.7 x 10'^, fold change = 0.71). In experiment GSE7314, IGF2 was over-expressed in pigs inoculated with Salmonella choleraesuis at 21 dpi relative to non-inoculated pigs (P-value < 8.1 x 10 *). These results are consistent with reports that IGF2 is down-regulated in pigs immune-challenged with lipopolysaccharide [44].

SCG1, SCG2 and SCG3, members of the secretogranin family, exhibited differential expression among immune-challenge experiments, consistent with the known association of these genes with cell activation, cytotoxicity and microbial defense [45]. Probes for all three SCGs were differentially expressed in two immune-related experiments: SCG1 and SCG2 in GSE14790, and SCG3 in GSE11787 [21]. In GSE11787, SCG3 was under-expressed in inoculated pigs relative to controls (P-value < 1.2 x 10"^, fold change = 0.33). These results are consistent with the lack of synthesis of endogenous granins in rat PC12 cells infected with recombinant vaccinia viruses [46]. In GSE14790, SCG1 and SCG2 were under-expressed in pigs inoculated with PCV2 relative to un-inoculated control pigs (P-value < 1.8 x 10'*, fold change = 0.88 and P-value < 1.5 x 10'^, fold change = 0.93, respectively).

Two members of the vascular endothelial growth factor family, VEGFC and VEGFD, were under-expressed in PCV2-inoculated pigs relative to control pigs (P-value < 1.8 x 10'^, fold change = 0.62) and were also under-expressed at early stages (7 dpi) relative to later stages (19 and 29 dpi) in GSE14790. In agreement with these findings, a loss of endothelial growth factor transcription and an increase in pro-inflammatory indicators were reported in the endometrial lymphocytes of pigs at sites of fetal arrest [47].

NPY (probe Ssc.15981.1.A1_at) was under-expressed in PCV2-inoculated pigs relative to control pigs (P-value < 6.6 x 10'*) and, within each infection group, NPY was under-expressed at earlier stages relative to 29 dpi in GSE14790. Consistent with these findings, NPY mRNA levels decreased in the blood of rats treated with vinblastine, an anti-cancer drug known to reduce the number of defensive white blood cells of the immune system [48]. Similarly, NPY was found to decrease in cattle infected with bovine spongiform encephalopathy [49].

Table 3 Differentially expressed prohormone and prohormone convertase genes (P-value < 0.005) across 35 microarray experiments by tissue class

Symbol  Probe^a                Imm.^b Emb.   CNS    Repro. Musc.  Fat    Gut    Total
Prohormone
ADM5    Ssc.26627.1.A1_at      0      0      1      0      0      0      0      1
ADML    Ssc.314.1.S1_at        2      0      1      0      1      0      1      5
ANF     Ssc.16245.1.S1_at      0      0      0      0      1      0      0      1
ANFB    Ssc.629.1.S1_at        0      0      0      0      1      0      0      1
ANFC    Ssc.23867.1.A1_at      0      1      1      0      0      0      0      2
AUGN    Ssc.22487.1.S1_at      2      0      0      0      1      1      0      4
CART    Ssc.15900.1.S1_at      0      1      1      0      0      0      0      2
CCKN    Ssc.717.1.S1_at        1      0      0      1      1      0      0      3
CMGA    Ssc.4653.1.S1_at       0      0      0      0      0      1      1      2
COLI    Ssc.14556.1.S1_at      0      1      0      0      0      0      0      1
CRSP1   Ssc.3741.1.S1_at       0      0      0      0      0      0      0      0
CRSP2   Ssc.18558.1.S1_at      0      1      0      0      0      0      0      1
CRSP3   Ssc.17879.1.S1_at      1      0      0      0      0      0      0      1
EDN1    Ssc.9364.1.S1_at       2      0      0      0      0      1      0      3
GALA    Ssc.713.1.S1_at        1      1      0      0      0      0      1      3
GALP    Ssc.4875.1.S1_at       1      1      0      0      1      0      0      3
GAST    Ssc.6441.S1_at         0      1      0      0      0      0      0      1
GHRL    Ssc.440.1.S1_at        0      0      0      0      0      0      0      0
GLUC    Ssc.17225.1.S1_at      0      1      0      1      0      0      1      3
GON1    Ssc.16310.1.S1_at      1      1      0      0      0      0      0      2
HEPC    Ssc.376.1.S1_at        0      0      0      0      0      0      0      0
IAPP    Ssc.8324.1.A1_at       0      1      0      0      0      0      0      1
IGF1    Ssc.16231.1.S1_a_at    1      0      1      0      0      0      0      2
        Ssc.16231.2.A1_a_at    0      0      0      0      0      0      0      0
        Ssc.16231.3.S1_a_at    0      0      1      0      0      0      0      1
IGF2    Ssc.9365.1.S1_at       1      0      0      0      0      0      0      1
        Ssc.9365.2.S1_a_at     1      1      0      0      0      1      0      3
        Ssc.9365.3.S1_a_at     1      0      0      0      0      0      0      1
        Ssc.9365.3.S1_x_at     0      0      0      0      0      0      0      0
        Ssc.9365.4.S1_a_at     0      1      0      0      0      0      0      1
        Ssc.9365.5.A1_at       1      0      0      0      0      0      0      1
        Ssc.9365.5.S1_at       1      1      0      0      0      0      0      2
        Ssc.9365.5.S1_a_at     0      0      0      0      0      1      0      1
        Ssc.9365.6.A1_a_at     0      0      0      0      0      0      0      0
        Ssc.9365.6.A1_x_at     0      0      0      0      0      0      0      0
        Ssc.9365.6.S1_x_at     1      0      1      0      0      0      0      2
        Ssc.9365.7.A1_x_at     0      0      0      0      0      0      0      0
INS     Ssc.583.1.S1_at        0      0      0      0      0      0      0      0
INSL3   Ssc.11990.1.S1_at      0      1      0      0      0      0      0      1
MCH     Ssc.3287.1.S1_at       0      0      0      0      0      0      0      0
MOTI    Ssc.7141.S1_at         0      0      0      0      0      0      0      0
NEU1    Ssc.15668.1.A1_at      0      0      0      0      0      0      0      0
NEU2    Ssc.4210.1.S1_at       0      0      0      0      1      0      0      1
NMB     Ssc.2083.1.A1_at       1      0      0      0      0      0      0      1
NMU     Ssc.12508.1.A1_at      1      0      0      0      0      0      0      1
NPW     Ssc.15796.1.S1_at      0      1      0      0      0      0      0      1
NPY     Ssc.15981.1.A1_at      1      1      0      0      1      0      0      3
        Ssc.15981.1.S1_at      0      2      1      0      0      0      0      3
OREX    Ssc.15983.1.S1_at      0      0      0      0      0      0      0      0
PACA    Ssc.27598.1.S1_at      0      1      0      0      0      0      0      1
PAHO    Ssc.456.1.S1_at        0      1      0      0      0      0      0      1
PCSK1N  Ssc.17429.1.S1_at      0      1      0      1      0      0      0      2
PDGFA   Ssc.6173.3.S1_a_at     1      0      1      0      0      0      1      3
PDYN    Ssc.121.1.S1_at        0      1      1      0      0      0      0      2
PENK    Ssc.11281.1.A1_at      0      1      0      1      0      1      1      4
        Ssc.11281.2.S1_at      1      1      0      0      0      0      0      2
PNOC    Ssc.15910.1.A1_at      0      0      0      0      0      0      0      0
        Ssc.15910.1.S1_at      0      0      0      0      0      0      0      0
PTHR    Ssc.9991.1.S1_at       0      1      1      2      0      0      0      4
PTHY    Ssc.668.1.S1_at        0      1      0      0      0      0      0      1
REL1    Ssc.162.1.S1_at        1      1      0      0      0      0      0      2
SCG1    Ssc.15718.1.A1_at      1      1      0      0      0      0      1      3
SCG2    Ssc.13645.1.A1_at      1      0      0      0      0      1      1      3
SCG3    Ssc.6770.1.A1_at       1      1      0      1      0      0      0      3
SECR    Ssc.710.1.S1_at        0      1      0      0      0      0      0      1
SMS     Ssc.19520.1.A1_at      1      1      0      0      1      0      0      3
TKN1    Ssc.18075.1.A1_at      0      0      0      0      0      0      0      0
        Ssc.18075.2.S1_at      0      1      0      0      0      0      0      1
TKN4    Ssc.23153.1.S1_at      0      0      0      0      0      0      0      0
TKNK    Ssc.19565.1.S1_at      0      0      0      0      0      0      0      0
        Ssc.19565.2.A1_at      0      0      0      0      0      0      0      0
UTS2    Ssc.437.1.S1_a_at      0      1      0      0      0      0      0      1
VEGFC   Ssc.12790.1.A1_at      1      1      1      0      1      0      1      5
VEGFD   Ssc.29289.1.A1_at      1      1      0      0      0      0      0      2
Total                          30     35     12     7      10     7      9      110
Prohormone convertase
PCSK1   Ssc.141.1.S1_at        1      1      0      0      0      0      1      3
PCSK2   Ssc.109.1.S1_at        0      0      0      0      0      0      0      0
PCSK7   Ssc.5628.1.S1_at       1      1      0      0      0      0      1      3
Total                          2      2      0      0      0      0      2      5

^a Affymetrix microarray gene probe identifier.
^b Experiment classes: Imm.: primary immune-response tissues; Emb.: embryo and placenta; CNS: brain and central nervous system; Repro.: reproduction; Musc.: muscle; Fat: fat; Gut: gut.

ADML was differentially expressed in GSE14758-D and GSE7314. In GSE14758-D [41], ADML was under-expressed in the mediastinal lymph nodes of PCV2-infected pigs relative to control pigs at 29 dpi (P-value < 1.7 x 10'^, fold change = 0.6). In contrast, in GSE7314, ADML was over-expressed at 48 dpi in the blood of pigs inoculated with Salmonella choleraesuis relative to controls (P-value < 4.8 x 10'^). The latter result is consistent with the up-regulation of ADML gene expression and the increase in systemic circulating ADML concentrations that accompany the onset and progression of trauma, infection, and sepsis [50]. The apparent inconsistency between the two experiments may be associated with the diverse effects that ADML has on cellular metabolism, immune function, endocrine function, and cardiovascular function.

Of the 3 prohormone convertases represented on the microarray platform, PCSK1 and PCSK7 were significantly differentially expressed (P-value < 1.3 x 10' ) and PCSK2 was marginally significantly differentially expressed (P-value < 6.5 x 10'^) in GSE14790. PCSK1 was already under-expressed in PCV2-inoculated pigs at 7 dpi relative to 29 dpi, regardless of inoculation at the later stage (P-value < 5.7 x 10'^). Likewise, PCSK7 was already under-expressed in PCV2-inoculated pigs relative to controls at 7 dpi (P-value < 4.2 x 10'*) and, within controls, PCSK7 was under-expressed at early stages (7 dpi, 21 dpi) relative to 29 dpi (P-value < 4.1 x 10'*). These results are in agreement with findings that furin, another prohormone convertase, was dysregulated in the immune cells of advanced human atherosclerotic plaques [51], and suggest that prohormone convertases, like prohormone genes, are down-regulated under immune challenge.

Embryo and placenta profiling 

In GSE18641 [52], IGF2 (probe Ssc.9365.2.S1_a_at) was over-expressed in pregnant sows relative to non-pregnant sows (P-value < 2.7 x 10^-3, fold change = 1.23). In GSE12705 [20], IGF2 (probes Ssc.9365.4.S1_a_at and Ssc.9365.5.S1_at) was over-expressed in earlier-stage conceptuses (day 11 spherical and day 11 and 12 tubular) relative to later-stage conceptuses (day 12 and 14 filamentous) (P-value < 2.1 x 10'*). This IGF2 profile is supported by Pantaleon et al. [53], who showed that IGF2 is required for mouse embryos to progress from early stages to the blastocyst stage. Gupta et al. [54,55] reported that the expression of the embryo-survival-related gene IGF2 increased with the addition of nonessential amino acids or phytohemagglutinin in pig embryos and blastocysts, respectively.

Both PENK probes were over-expressed in tubular and spherical conceptuses relative to filamentous conceptuses (P-value < 2.6 x 10''') in experiment GSE12705 [20]. This is consistent with the finding that PENK mRNA increases linearly during gestation in the hippocampus of pigs [56].
PTHR was under-expressed in tubular and spherical relative to filamentous conceptuses (P-value < 6.4 x 10' , fold change = 0.02) in experiment GSE12705. This finding is supported by reports that PTHR is present in higher concentrations in fetal pigs than in sows [56]. VEGFC is a member of the vascular endothelial growth factor family of prohormones, which play an important role in the survival and mitogenesis of endothelial cells and in the lymphangiogenesis and angiogenesis of embryos [57]. VEGFC was over-expressed in pregnant sows relative to non-pregnant sows (P-value < 7.8 x 10'*) in experiment GSE18641 [52]. This finding is supported by a study in the chicken showing that VEGFC stimulated the growth of embryonic microvessels in the chorioallantoic membrane (analogous to the placenta in mammals) [58]. The expression profile is also supported by the finding that, in mouse embryos, VEGFC is required for successful lymphatic vasculature development and lymphatic endothelial cell migration [59].

Brain and central nervous system 

Eleven differentially expressed prohormone genes were identified in experiments concerning the hypothalamus, thyroid, and olfactory bulb (neuroblasts). These genes are adrenomedullin-5 (ADM5), ADML, C-type natriuretic peptide (ANFC), cocaine- and amphetamine-regulated transcript protein (CART), IGF1, IGF2, NPY, platelet-derived growth factor subunit A (PDGFA), prodynorphin (PDYN), PTHR, and VEGFC.

ADML was over-expressed in the immortalized porcine olfactory bulb neuroblasts relative to the non-neural epithelial cells (P-value < 2.2 x 10'^, fold change > 10) in experiment GSE16855 [60]. This result is supported by a previous study in the mouse olfactory bulb showing that ADML is important for regulating the proliferation and differentiation of neural stem/progenitor cells [61].

IGF1 was over-expressed in the neuroblasts relative to non-neural epithelial cells (average P-value < 5 x 10'^, fold change > 10) in experiment GSE16855. This result is supported by a study in chickens showing that IGF1 is expressed in the olfactory bulb [62]. Also, IGF2 (probe Ssc.9365.6.S1_x_at) was consistently over-expressed in the hypothalamus of male Iberian pigs relative to the other seven breed-gender combinations (on average, P-value < 2.3 x 10' , fold change = 2.42) in experiment GSE14739-H [63,64]. NPY was over-expressed (P-value < 8.1 x 10^-4, fold change = 7.94) in neuroblasts relative to non-neuronal cells in GSE16855. This result is consistent with reports that the olfactory bulb exhibits high levels of immunoreactive NPY in the pig brain [65] and that NPY may inhibit excitatory neurotransmission in the rat olfactory bulb [66].
VEGFC was over-expressed in neuroblasts relative to non-neuronal cells (P-value < 1.5 x 10' , fold change > 10) in experiment GSE16855. This result agrees with the 30% increase in dividing neuroblasts in olfactory bulb cultures stimulated with VEGFC, relative to controls, reported by Le Bras et al. [67]. PTHLH was under-expressed in neuroblasts relative to non-neuronal cells (P-value < 2.6 x 10'*, fold change = 0.20) in GSE16855. This finding is consistent with reports that PTHLH may be a negative regulator of chondrocyte differentiation [68]. PDGFA was over-expressed in neuroblasts relative to non-neuronal cells (P-value < 1.2 x 10'*) in experiment GSE16855. Related to this result, Fressinaud et al. [69] reported that platelet-derived growth factors increase glutamine synthetase activity in astrocytes in the brain.

Prediction of cleavage sites in pig prohormones 

All 97 prohormone sequences were used to predict cleavage, and the predictions were compared against known or predicted cleavage sites. These sequences were inferred to have 228 cleavage sites, resulting in a 14.6% prevalence rate (proportion of possible sites that are cleaved). Most sites were cleaved at an arginine (R); the most frequently cleaved motifs were xxKR (71%), RxxR (34%) and xxRR (41%), where x denotes any amino acid and K denotes lysine. There were 38 (5%) cleaved C-terminal single R sites with no basic amino acid at the second or fourth position preceding the cleavage site (the P2 and P4 locations, respectively).
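The motif categories above can be checked mechanically from the residues at positions P4 to P1 on the N-terminal side of a candidate cleavage site. The following is a minimal sketch under stated assumptions, not the Known Motif model itself: the helper name `classify_site`, the 0-based `p1` index of the residue immediately before the scissile bond, and the choice to count only lysine and arginine as basic are all illustrative. Because one site can match several patterns (RxKR, for example, is both xxKR and RxxR), the function returns a set of labels, which is also why the motif percentages above sum to more than 100%.

```python
from typing import Set

# Lysine and arginine only; whether histidine counts as "basic" here is an assumption.
BASIC = {"K", "R"}


def classify_site(seq: str, p1: int) -> Set[str]:
    """Return the motif labels matched by a cleavage after residue seq[p1].

    P1 is seq[p1], P2 is seq[p1 - 1] and P4 is seq[p1 - 3] (0-based indices).
    Positions before the start of the sequence are treated as 'x' (unknown).
    """
    if not 0 <= p1 < len(seq):
        raise IndexError("p1 lies outside the sequence")

    def residue(offset: int) -> str:
        i = p1 - offset
        return seq[i] if i >= 0 else "x"

    p1_aa, p2_aa, p4_aa = residue(0), residue(1), residue(3)
    labels: Set[str] = set()
    if p1_aa != "R":
        return labels  # this sketch only labels arginine-terminated sites
    if p2_aa == "K":
        labels.add("xxKR")
    if p2_aa == "R":
        labels.add("xxRR")
    if p4_aa == "R":
        labels.add("RxxR")
    if p2_aa not in BASIC and p4_aa not in BASIC:
        labels.add("single R")  # no basic residue at P2 or P4
    return labels


print(sorted(classify_site("AGRSKR", 5)))  # ['RxxR', 'xxKR']
print(sorted(classify_site("AGLSER", 5)))  # ['single R']
```

The example sequences are hypothetical; tallying these labels over all known cleaved sites would give motif frequencies of the kind reported above.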

The performance of the cleavage prediction models is presented in Table 4. The correct classification rate ranged from 82% to 92%, indicating that a large proportion of the sites were accurately predicted across all models. The human cleavage prediction models had the best performance for most of the statistics, followed by the mammalian model. The Known Motif model provided the highest number of true positive predictions and the highest sensitivity (77%), indicating that more than three quarters of the cleaved sites were correctly predicted as cleaved. However, this model also provided the highest number of false positive predictions. Consequently, the positive predictive power of the Known Motif model was 35%, indicating that, on average, only 35% of the sites predicted to be cleaved are expected to be true cleavage sites.
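Each rate in Table 4 follows from the four counts of a model's confusion matrix. As a consistency check, the sketch below applies the standard definitions (including the Matthews correlation coefficient) to the Known Motif counts and reproduces the tabulated values to four decimals; AUC is left out because it requires the underlying prediction scores rather than only the counts. The `Confusion` type and `performance` function are illustrative names, not part of any of the cited tools.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # cleaved sites predicted as cleaved
    tn: int  # non-cleaved sites predicted as non-cleaved
    fp: int  # non-cleaved sites predicted as cleaved
    fn: int  # cleaved sites predicted as non-cleaved


def performance(c: Confusion) -> dict:
    """Accuracy measures as defined in the Table 4 footnote."""
    total = c.tp + c.tn + c.fp + c.fn
    mcc_den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "correct_classification": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # 1 - false negative rate
        "specificity": c.tn / (c.tn + c.fp),  # 1 - false positive rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive power
        "npv": c.tn / (c.tn + c.fn),          # negative predictive power
        "mcc": (c.tp * c.tn - c.fp * c.fn) / mcc_den if mcc_den else 0.0,
    }


known_motif = Confusion(tp=181, tn=1520, fp=329, fn=54)
for name, value in performance(known_motif).items():
    print(f"{name}: {value:.4f}")
# correct_classification 0.8162, sensitivity 0.7702, specificity 0.8221,
# ppv 0.3549, npv 0.9657, mcc 0.4358
```

The same function applied to the other columns makes the trade-off discussed here explicit: the large false positive count of the Known Motif model is what drags its positive predictive power down despite its high sensitivity.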

The human models provided the highest numbers of true negatives, resulting in better performance than the Known Motif and mammalian models. The human artificial neural network models had approximately 60% positive predictive power, indicating that most sites predicted as cleaved are expected to be true positives. Although the human logistic models had lower sensitivity than their artificial neural network counterparts, the differences between the amino-acid-only logistic and artificial neural network models amounted to only 4 cleaved and 11 non-cleaved sites. The high performance of the human models suggests that the cleavage of prohormones into potentially biologically active neuropeptides in the pig is similar to that in humans. Notably, the mammalian model was trained on 51 mammalian prohormones that included 8 pig prohormones. This model provided slightly more true positive predictions and a higher sensitivity than the human logistic models. However, the mammalian model had noticeably more false positive predictions than the human logistic models, resulting in lower performance on the other accuracy measures.



The comparison of results across models also provides information on the accuracy of the cleavage assignment, the prediction accuracy and the potential for a gene to produce bioactive peptides. For 10 prohormones, at least 5 of the models did not predict any cleavage site. However, 4 of these prohormones (ANF, GHRL, IGF1 and PDGFD) are likely to have sites cleaved by proteases other than prohormone convertases. For example, ANF is cleaved by corin, a serine peptidase (CORIN) [70].

Genes with no predicted cleavage, or with assigned cleavages that differ from those of other species, can help identify proteins that are not cleaved to form smaller peptides. There is no evidence for cleavage of UCN2 and UCN3 to produce mature peptides in mammals [71]. All models failed to predict two cleavage sites in TOR2X. The first site, an N-terminal dibasic 'RK', is known to be rarely cleaved across species [72]. The second site is cleaved in humans to form alpha- and beta-salusin, but this site may not be cleaved in the pig because the pig sequence, like the bovine sequence, has only a single basic residue instead of the human dibasic 'RR' site. Similarly, for TKN4, the genomic prediction and supporting EST data indicate a change from an R in other species to a glycine in the pig sequence that may prevent the formation of the 'Hemokinin' peptide.

The pig CRSP2 protein sequence lacks the 'KR' site and a C-terminal cleavage site that are cleaved in the products of the human CALCA and CALCB genes to produce calcitonin gene-related peptide 1 and calcitonin gene-related peptide 2. Therefore, pig CRSP2 is unlikely to yield these calcitonin gene-related peptides. The assigned cleavages in the RES18 protein are necessary to produce a potential triskadecapeptide reported by Bloomquist et al. [73]. This potential peptide has flanking dibasic cleavage sites in the mouse and rat, but it has not been experimentally confirmed. The corresponding region in the human, bovine and pig sequences is monobasic and lacks common PC cleavage motifs, suggesting that these species probably cannot form this peptide.

Examination of the potential cleavage sites in IAPP indicated that a mutation from R to Q in the N-terminal cleavage site is necessary to produce the islet amyloid polypeptide. Examination of the corresponding ESTs indicated that 2 swine ESTs ([GenBank:AJ649149] and [GenBank:AJ649469]) were 100% identical to the rat genome and were therefore considered invalid sequences. Two other ESTs ([GenBank:EW569366], [GenBank:BF712755]) matched the region that supported the genomic prediction. The predicted protein sequence, including the potential cleavage sites of the expected IAPP, was less than 80% identical to the sequences of other mammals, compared with the typical identity of over 85% between the human and most other mammalian sequences. Potter et al. [74] questioned the capability of IAPP to form amyloids after examining the functionality of a synthesized pig sequence based on the [GenBank:BF712755] EST sequence. The predicted prohormone sequence and the cleavage prediction results also strongly suggest that the pig is unlikely to be able to form IAPP. This underscores the importance of proteomic studies of cleavage to first determine whether a species can produce a given peptide.

Table 4 Performance of various cleavage prediction models to predict cleavage in pig prohormones

Performance                   Known      Mammalian  Human      Human      Human      Human
criteria^a                    Motif      Logistic   Logistic   Logistic   ANN^d      ANN
                                         AA^b       AA         AA Prop.^c AA         AA Prop.
True positives                181        165        160        158        164        167
True negatives                1520       1640       1724       1670       1735       1747
False positives               329        209        125        179        114        102
False negatives               54         70         75         77         71         68
Correct classification        0.8162     0.8661     0.904      0.8772     0.9112     0.9184
Sensitivity                   0.7702     0.7021     0.6809     0.6723     0.6979     0.7106
Specificity                   0.8221     0.887      0.9324     0.9032     0.9383     0.9448
Positive predictive power     0.3549     0.4412     0.5614     0.4688     0.5899     0.6208
Negative predictive power     0.9657     0.9591     0.9583     0.9559     0.9607     0.9625
Correlation                   0.4358     0.4856     0.5645     0.4944     0.5919     0.6184
AUC                           0.8006     0.847      0.86       0.8186     0.8589     0.8802

^a Performance criteria. True positives: number of cleaved sites correctly predicted as cleaved; True negatives: number of non-cleaved sites correctly predicted as non-cleaved; False positives: number of non-cleaved sites incorrectly predicted as cleaved; False negatives: number of cleaved sites incorrectly predicted as non-cleaved; Correct classification rate: number of correctly predicted sites divided by the total number of sites; Sensitivity (one minus the false negative rate): number of true positives divided by the total number of cleaved sites; Specificity (one minus the false positive rate): number of true negatives divided by the total number of non-cleaved sites; Positive predictive power: number of true positives divided by the total number of sites predicted to be cleaved; Negative predictive power: number of true negatives divided by the total number of sites predicted not to be cleaved; Correlation: Matthews correlation coefficient between observed and predicted cleavage; AUC: area under the receiver operating characteristic (ROC) curve relating sensitivity and 1 - specificity.
^b AA: models trained only on amino acids.
^c AA Prop.: models trained with amino acids combined with the physicochemical properties of amino acids.
^d ANN: artificial neural network approach.

Conclusions 

The pig is an important biomedical and agricultural research species. Results from the first genome-wide study of pig prohormone and prohormone convertase genes, including their functional annotation and the prediction of prohormone cleavage, have been presented. This study was enabled by the availability of the pig genome sequence and of 35 gene expression experiments that evaluated a wide range of conditions in pigs. These results offer further insight into the role of neuropeptides in biological processes such as reproduction, development, growth, and health, and support targeted empirical confirmation. The bioinformatics pipeline used in this study can be used to identify prohormones or other sets of genes in species with similar sequence resources. Confirmatory insight into the pig prohormones can be expected from proteomic mass spectrometry studies.

By combining complementary bioinformatic resources, we identified 95 prohormone genes, 8 prohormone convertases and one cleavage facilitator in the pig genome and raw sequence repositories. We uncovered 11 prohormone genes that have not been previously reported and one, neuropeptide S, whose previously reported sequence is potentially incorrect. The high performance of the models used to predict cleavage in the pig prohormones suggests that prohormone cleavage in pigs is similar to that in humans. The analysis of 35 gene expression experiments identified various neuropeptide genes differentially expressed in immune-related tissues, embryo and placenta, and the central nervous system, including AUGN, IGF2, the SCG family, NPY, ADM and ADML, PENK, PTHR, and VEGFC. Experiments are required to confirm that the pig does not produce the bioactive neuropeptides UCN2, UCN3, TOR2X, TKN4, IAPP and CRSP2, as suggested by the cleavage prediction models.

Methods 

Search for pig prohormone and convertase genes 

A registry of approximately 100 candidate mammalian prohormone and convertase genes was built from public sequence databases (NCBI Gene [75], release date September 2011; UniGene [76], release date April 13, 2011; and UniProt [77], release 2011_11, November 16, 2011) and a literature review [11-13,19,78-80].

Candidate genes were searched for in the pig genome 10.2 assembly using the bioinformatics pipeline developed by Southey et al. [11,13]. The protein sequence of each candidate gene in the registry was searched against the pig genome assembly using the TBLASTN, BLASTP and BLASTN programs from NCBI BLAST (version 2.18) [81] with default parameter settings (E-value < 10 and BLOSUM62 scoring matrix) and filtering disabled. In addition, sequences not used in the pig genome assembly (including unassigned genomic regions, whole genome shotgun sequences and trace archives) were searched when there was no suitable BLAST match to a candidate gene or when the alignment to the genome assembly suggested a missing genomic region. This strategy allowed the annotation of genomic regions that were partly or entirely missing from the assembly.
The BLAST matches were examined based on the alignment score and E-value to identify the most likely matches and the genomic location of the corresponding prohormone. The identified pig genomic region that encompassed the BLAST match was further extended by approximately 500 base pairs at the 5' and 3' ends of the match. Matches were also screened for alignments to multiple homologous prohormone genes that could indicate gene duplication events in the pig genome. The gene parsing tool Wise2 [82] was used to predict the protein sequence within the genomic regions detected with BLAST. The genomic region was further extended when only a partial protein sequence was predicted. In this study, Wise2 compared the target protein (preference was given to pig protein sequences, followed by human, cattle and other mammals) to the pig genomic DNA sequence identified by BLAST to infer the gene structure based on a model that includes introns and frameshift errors. Each predicted gene was compared to the UniProt and NCBI Gene databases to assess the accuracy of the prediction based on previously reported pig genes. To further confirm the Wise2 predictions, the protein sequence predicted from the gene model was also compared to the corresponding published mammalian sequences using the multiple sequence alignment tool ClustalW [83]. The multiple sequence alignment maximized the likelihood of identifying homologous genes. The predicted sequences were also searched against the pig entries in the NCBI EST database to confirm the presence of the predicted protein sequence. The pig entries in the NCBI EST database were also used to complete the protein sequence when the genome coverage was incomplete.
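The region-padding step described above is simple interval arithmetic, sketched below under assumptions: coordinates are 1-based and inclusive (as in BLAST tabular output, where minus-strand hits report the start after the end), the padded interval is clipped at the contig ends, and the helper name `extend_region` is hypothetical. It is an illustration, not code from the Southey et al. pipeline.

```python
from typing import Tuple


def extend_region(start: int, end: int, contig_length: int, flank: int = 500) -> Tuple[int, int]:
    """Pad a 1-based, inclusive hit interval by `flank` bases on each side.

    Minus-strand hits in BLAST tabular output report start > end, so the
    coordinates are ordered first; the padded interval is clipped to the contig.
    """
    if start > end:
        start, end = end, start
    if start < 1 or end > contig_length:
        raise ValueError("hit lies outside the contig")
    return max(1, start - flank), min(contig_length, end + flank)


print(extend_region(1200, 1800, contig_length=10_000))  # (700, 2300)
print(extend_region(300, 100, contig_length=1_000))     # (1, 800)
```

Calling the helper again with a larger `flank` mirrors the further extension applied when Wise2 returned only a partial protein sequence.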

Functional annotation of the pig prohormone and 
convertase genes 

A review of the pig microarray gene expression experi- 
ments available in the NCBI GEO database [84] indi- 
cated that the Affymetrix Porcine Genome Array 
GPL3533 [85] was the most commonly used platform. 
The UniGene database was searched for sequences that 
represent prohormone and prohormone convertase 
genes. This information was used to identify the probes 
representing prohormone and prohormone convertase 
genes in the Affymetrix Porcine Genome platform. 

Thirty-five experiments that used the Affymetrix Por- 
cine Genome platform were identified in GEO. Selected 
experiments had a minimum of 6 microarrays and a max- 
imum of 80 microarrays. The sources and main features 
of these experiments are provided [see Additional file 4]. 
The wide range of selected microarray experiments supported a comprehensive characterization of the associations of prohormone, neuropeptide and convertase genes with various biological processes.



The experiments were grouped into 7 classes: primary immune-response tissues, embryo and placenta, brain and central nervous system, reproduction, muscle, fat, and gut. For experiments encompassing multiple tissues (GSE14739, GSE18359, GSE13528), the samples corresponding to each tissue were grouped and analyzed separately to facilitate the interpretation of results. The numbers of GEO experiments within each class were: immune: 6 (GSE7313, GSE7314, GSE11787, GSE17492, GSE14758-mediastinal lymph nodes, and GSE14790); embryo and placenta: 5 (GSE18467, GSE18641, GSE18343, GSE11853, and GSE12705); brain and nervous system: 5 (GSE16855, GSE12604, GSE14739-hypothalamus, GSE14739-thyroid, and GSE14739-adenohypophysis); reproduction: 2 (GSE11590 and GSE14739-gonads); muscle: 7 (GSE18653, GSE19275, GSE8974, GSE14643, GSE15211, GSE21096, and GSE16348-skeletal muscle); fat: 8 (GSE17309, GSE14373, GSE14739-fat, GSE9333, GSE18359-fat, GSE18359-liver, GSE13528-fat and GSE13528-liver); gut: 2 (GSE14357 and GSE15256).

The gene expression data were pre-processed and normalized using the Affy R package [86]. Steps included log2 transformation and GC-robust multichip average (GCRMA) normalization of the gene expression measurements. All probes on the platform were analyzed using ANOVA to identify those that exhibited differential expression across the conditions studied. The false discovery rate (FDR) approach [87] was used to adjust the statistical significance of differential expression for multiple testing across all probes. Normalization, one- or two-way ANOVA, and multiple-testing adjustment were performed using Beehive [88].
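The FDR adjustment step can be sketched in a few lines. This sketch assumes the Benjamini-Hochberg step-up procedure, the most widely used FDR adjustment; the exact variant implemented in Beehive [88] is not specified here, and the function name `benjamini_hochberg` and the example p-values are illustrative only.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original probe order.

    Illustrative sketch only: assumes the standard BH step-up procedure.
    """
    m = len(pvalues)
    # Indices of the probes sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1 and enforces monotonicity
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw ANOVA p-values for four probes.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

For these four hypothetical probes the adjusted values are, mathematically, 0.02, 0.04, 0.04 and 0.02; probes would then be declared differentially expressed when their adjusted value falls below the chosen FDR threshold.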

Prediction of cleavage sites 

The locations of cleavage sites in pig prohormone proteins that would yield potentially active neuropeptides were predicted using NeuroPred [72]. Complete prohormone sequences from UniProt were used in preference to the predicted sequences. In a limited number of cases, EST sequences were combined with the genomic data and published partial sequences to predict the complete prohormone sequence. For example, three glutamic acid residues of Chromogranin-A (CMGA) were missing from the genome-based prediction but present in the corresponding UniProt fragment sequence [Swiss-Prot: P04404] and EST sequence [GenBank:EW261315]; combining these sources permitted the prediction of the complete pig CMGA protein sequence. The locations of potential cleavage sites in the pig prohormones were inferred by homology to human data.
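Transferring cleavage-site annotations by homology amounts to walking a pairwise alignment and converting residue positions from one sequence to the other. The sketch below assumes that a gapped human-pig alignment has already been computed with a standard pairwise aligner and that each site is recorded as the 1-based position of the residue after which cleavage occurs; the function `map_sites` and the toy sequences are hypothetical.

```python
from typing import Dict, Iterable, Optional


def map_sites(aligned_human: str, aligned_pig: str,
              human_sites: Iterable[int]) -> Dict[int, Optional[int]]:
    """Map 1-based human residue positions onto pig residue positions.

    Both strings are rows of the same pairwise alignment ('-' marks a gap).
    A site maps to None when the pig sequence has a gap at that column
    or when the position lies outside the human sequence.
    """
    if len(aligned_human) != len(aligned_pig):
        raise ValueError("aligned sequences must have equal length")
    h_pos = p_pos = 0
    human_to_pig: Dict[int, Optional[int]] = {}
    for h_char, p_char in zip(aligned_human, aligned_pig):
        # Advance the residue counter of each sequence that is not gapped here.
        if h_char != "-":
            h_pos += 1
        if p_char != "-":
            p_pos += 1
        # Record which pig residue (if any) is aligned to this human residue.
        if h_char != "-":
            human_to_pig[h_pos] = p_pos if p_char != "-" else None
    return {site: human_to_pig.get(site) for site in human_sites}


# Toy alignment: human sites after residues 3 and 7 transfer to the pig;
# human residue 4 faces a gap in the pig sequence.
print(map_sites("MKRAG-KRLE", "MKR-GSKRLE", [3, 7, 4]))  # {3: 3, 7: 7, 4: None}
```

A transferred site would still need to be checked for the presence of the basic residues that define it in the pig sequence, since alignment alone does not guarantee conservation.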

Complementary cleavage prediction models trained on confirmed cleavages from mammalian sequences [12,78,79] were used to predict cleavages in the pig prohormone sequences. These models included the known motif model, which searches for sites with specific combinations of basic amino acids associated with cleavages reported in other species [78]; the mammalian logistic regression model [79]; and the human logistic regression and artificial neural network models based on amino acids alone or on amino acids combined with their physicochemical properties [12].
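The idea behind a motif-based predictor can be illustrated with a deliberately simplified rule set. This sketch is not NeuroPred's known motif model [78], whose motif list is more extensive; it assumes only two hypothetical rules, a dibasic pair (K or R followed by K or R) and a monobasic R with another R three residues upstream (the RXXR pattern), and reports the 1-based position of the residue after which cleavage is predicted. The name `predict_motif_sites` is illustrative.

```python
from typing import List

BASIC = set("KR")


def predict_motif_sites(seq: str) -> List[int]:
    """Return 1-based positions of residues after which cleavage is predicted.

    Hypothetical, simplified rules (not NeuroPred's actual motif list):
      * dibasic: the site residue and the one before it are both K or R;
      * RXXR: the site residue is R and the residue three positions earlier is R.
    The final residue is skipped because no peptide would follow it.
    """
    seq = seq.upper()
    sites = []
    for i in range(1, len(seq) - 1):
        dibasic = seq[i] in BASIC and seq[i - 1] in BASIC
        rxxr = seq[i] == "R" and i >= 3 and seq[i - 3] == "R"
        if dibasic or rxxr:
            sites.append(i + 1)
    return sites


# Toy sequence with a KR pair (residues 2-3) and an RXXR motif (residues 7-10).
print(predict_motif_sites("AKRGGGRAGRLE"))  # [3, 10]
```

Rule-based scanning of this kind tends to flag many basic residues that are never cleaved, which is one motivation for the statistical models (logistic regression and neural networks) that weigh the surrounding residues.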

Known or predicted cleavage sites on all 97 prohormone sequences were used to assess the ability of the models to predict cleavage. The "observed" cleavage sites, known or inferred from homology to other species based on a literature search [11-13,19,78-80], were compared to the cleavage sites predicted by the models. The counts of true positives (number of correctly predicted cleaved sites), true negatives (number of correctly predicted non-cleaved sites), false positives (number of incorrectly predicted cleaved sites) and false negatives (number of incorrectly predicted non-cleaved sites), or functions of these counts, were used to assess model performance. These counts were used to compute the correct classification rate (number of correctly predicted sites divided by the total number of sites), sensitivity (number of true positives divided by the total number of cleaved sites), specificity (number of true negatives divided by the total number of non-cleaved sites), positive predictive power (number of true positives divided by the total number of sites predicted to be cleaved), negative predictive power (number of true negatives divided by the total number of sites predicted to not be cleaved), and the Matthews correlation coefficient between observed and predicted cleavage. The area under the receiver operating characteristic (ROC) curve relating sensitivity and 1 - specificity [78] was also calculated; area values lower than 0.7 indicate poor model performance.
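The performance measures follow directly from the four counts. The sketch below computes them from hypothetical counts and estimates the ROC area with the rank-based (Mann-Whitney) formulation, which equals the fraction of cleaved/non-cleaved site pairs in which the cleaved site receives the higher model score, with ties counted as one half. Function names, counts and scores are illustrative assumptions, not values from this study.

```python
import math
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Performance measures computed from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "correct_classification_rate": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),  # TP / all cleaved sites
        "specificity": _ratio(tn, tn + fp),  # TN / all non-cleaved sites
        "positive_predictive_power": _ratio(tp, tp + fp),  # TP / predicted cleaved
        "negative_predictive_power": _ratio(tn, tn + fn),  # TN / predicted not cleaved
        "matthews_cc": _ratio(tp * tn - fp * fn, mcc_den),
    }


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparisons.

    labels: 1 for an observed cleaved site, 0 for a non-cleaved site.
    scores: model output, higher meaning more likely to be cleaved.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
print(classification_metrics(tp=40, tn=50, fp=5, fn=5))
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise form of the ROC area grows quadratically with the number of sites, which is acceptable for a few thousand candidate sites; a sort-based rank sum gives the same value more efficiently.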